Mass

In the previous blog post, we talked about visualising the locations where meteorites struck Earth and we made some conclusions on the locations where these meteorites were found. What we didn’t yet touch upon is the meteorite’s mass. We will continue to work with our cleaned meteorite data set, and we will set out to find out whether we can get some kind of mass distribution(s), and if we can use the meteorite mass in a global visualisation.

The packages which will be used are data.table, leaflet, ggplot2, maps, maptools, raster, dplyr and htmltools.

Meteorite mass visualisation

As an initial step, we can check the minimal and maximal meteorite mass to get an idea of the mass range.

met[, min(mass)]
## [1] 0
met[, max(mass)]
## [1] 6e+07

We get a mass range of 0 to 6e+07 g. It seems that something might have gone wrong in the minimal mass range, so for our further investigations, we will omit the meteorites with a mass of 0 g. To be sure we don’t throw away a significant number of meteorites, we first query our data to know how many objects have been recorded with zero mass. data.table’s .N functionality again provides a fast way to retrieve this information.

met[mass == 0, .N]
## [1] 18
met <- met[mass != 0]

To get an idea of the mass distribution, we will start with plotting a basic histogram. For this purpose, we will use the ggplot2 package. Plotting the histogram is then quite straightforward:

ggplot(data = met, aes(x = mass)) +
    geom_histogram() +
    xlab('Mass (g)') +
    ylab('Count')
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

We can see that even though the mass has a large range, the utmost majority of all masses can be found on the lower end of the histogram. A possible circumvention to this problem, where we want to keep the large masses in sight but we still want to see more detail in the smaller masses, is to replace the x-axis by a logarithmic scale. This can be done in ggplot2 by adding one single line to the image definition.

ggplot(data = met, aes(x = mass)) +
    geom_histogram() +
    xlab('Mass (g)') +
    ylab('Count') +
    scale_x_log10()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

This is exactly what we would like to see. It is a nice distribution with a long right tail, which might possibly be a Rayleigh distribution, but we lack too much knowledge on the subject to know for sure. To keep R from complaining, we can add the argument bins = 50 to the ggplot2::geom_histogram() function. For a cleaner plot we also add a ggplot2::theme() function.

ggplot(data = met, aes(x = mass)) +
    geom_histogram(bins = 50) +
    xlab('Mass (g)') +
    ylab('Count') +
    scale_x_log10() +
    theme(panel.background = element_rect(fill = NA))

A possible explanation for the steep fall-off in the lower bounds of the distribution can be given by the size of the meteorites located in this area. These are generally very small, and will not be seen easily. Another factor contributing is that a lot of meteorites of this size will have burnt up in the atmosphere.

It might be interesting to investigate whether both the “Found”- and “Fell”-categorised meteorites follow the same distribution. For this purpose, we can split the graph in two parts using the fall category. If we would use only one ggplot2::geom_histogram() function, we would create a stacked histogram. This is not what we want to see, so we use two function definitions, each with their own subset of the full data.

ggplot(data = met, aes(x = mass)) +
    geom_histogram(data = subset(met, fall == 'Found'), bins = 50, aes(fill = fall), alpha = 0.2) +
    geom_histogram(data = subset(met, fall == 'Fell'), bins = 50, aes(fill = fall), alpha = 0.2) +
    scale_fill_manual(name = 'Discovery', values = c('blue', 'red')) +
    xlab('Mass (g)') +
    ylab('Count') +
    scale_x_log10() +
    theme(panel.background = element_rect(fill = NA))

While there is a scaling factor in difference between the height of both distributions, it is visible that the shape of both distributions is different. The peak of the “Fell”-category meteorites has moved to the right. One could possibly explain it by using the aforementioned arguments about the size of the smaller meteorites. If we consider the most common meteorites of the total distribution, these have a mass of around 10 g. We can reason that meteorites of this size and smaller have a smaller chance of being seen when falling.

Global meteorite mass

An interesting statistic might be the average meteorite mass per country area. To visualise this purpose, let us follow the example guidelines on making choroplets on the leaflet for R github page, while making some necessary adaptations.

To prepare our data for visualisation, we first need to obtain the total meteorite mass for every country. We can do this with data.table’s functionalities where we sum the mass column for all country entries. We obtain a data.table object containing the masses per country. We will use this object for the visualisation, while we can use our old met data.table to provide possible metadata for the plot.

masses <- met[, sum(mass), by = country]
setnames(masses, 'V1', 'mass')
head(masses)
country mass
Germany 1956832.50
Denmark 58245.81
Canada 1328864.44
Mexico 72370561.40
Argentina 3862045.50
Pakistan 138956.60

To begin our visualisation, we need to get a map of the world where every country is represented by a polygon. We can do this using the maps package, and we get a map object mapWorld as our result. To create a SpatialPolygons object from our map object, we use the maptools::map2SpatialPolygons() function. If you remember the previous blog post, you know that some countries are represented in the format countryname:countrypart, where we want to extract the country name. Hence the part where we save the names under IDs. A next step consists of combining these polygons into a SpatialPolygonsDataFrame. We can achieve this by using the SpatialPolygonsDataFrame constructor to combine our SpatialPolygons object with the country ID’s we previously extracted. To add our constructed world data, we can merge the SpatialPolygonsDataFrame with our masses data.table, where we define for both data.frames on which column we want to merge. For both objects, this is the country column.

Note: while we did not explicitly load the sp package, this has been done automatically by loading our other packages.

mapWorld <- map('world', fill = TRUE, plot = FALSE)

IDs <- sapply(strsplit(mapWorld$names, ':'), function(x) x[1])
world <- map2SpatialPolygons(mapWorld, IDs = IDs, proj4string = CRS('+proj=longlat +datum=WGS84'))

world_df<- as.data.frame(sapply(slot(world, 'polygons'), function(x) slot(x, 'ID')))
row.names(world_df) <- sapply(slot(world, 'polygons'), function(x) slot(x, 'ID'))
world_SPDF <- SpatialPolygonsDataFrame(world, data = world_df)
names(world_SPDF) <- 'country'

world <- merge(world_SPDF, masses, by.x = 'country', by.y = 'country')

To get the total area for every country, we can search for an external data set, or we can choose to take an easier and less accurate approach, where we calculate every area in km^2 by using the raster::area() function to calculate the area of every country’s polygon. For our own comfort, let us choose the latter. Now that we have the total area and total meteorite mass for every country, it is easy to calculate the average mass per square kilometre for every country.

world$area_sqkm <- area(world) / 1000000
world$m_avg <- world$mass / world$area_sqkm

Now that our data is prepared, let us start constructing the actual visualisation using leaflet. The first step will be to plot the Earth where every country is given by its corresponding polygon.

leaflet(world) %>%
    setView(0, 0, zoom = 1) %>%
    addProviderTiles('CartoDB.Positron') %>%
    addPolygons()

To create colour bins for our data, we should first check in what range the mass per area lies. As not all countries are represented in NASA’s database, we will have some NA values introduced in the mass entry by the aforementioned merging step. These will transfer to NA values in the average mass entry. To get the actual mass range, we will have to omit these values in the minimum and maximum calculations.

min(na.omit(world$m_avg))
## [1] 0.0004658533
max(na.omit(world$m_avg))
## [1] 104.9483

We find that the average meteorite mass per area lies between 0 and 105 gram per square kilometre. Hence we can define our mass bins. Since we want enough divisions to map the different masses, we can not use the YlOrRd color palette as proposed by the Leaflet for R - Choropleths webpage, since this is limited to 9 bins. Hence we use the colorRampPalette() function to define our own palette. Let us also add a color legend with the addLegend() function.

bins <- c(0, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4, 12.8, 25.6, 51.2, 102.4, Inf)
pal <- colorBin(colorRampPalette(colors = c('#ffeda0', '#ff4500', '#bd0026'))(12), domain = world$m_avg, bins = bins)

leaflet(world) %>%
    setView(0, 0, zoom = 2) %>%
    addProviderTiles('CartoDB.Positron') %>%
    addPolygons(fillColor = ~pal(m_avg),
                weight = 1.2,
                opacity = 1,
                color = 'white',
                fillOpacity = 0.7) %>%
    addLegend(pal = pal,
              values = ~m_avg,
              opacity = 0.7,
              title = NULL,
              position = 'bottomright')

Great, we are almost there! The last step will consist of constructing a label which gets displayed when a country is hovered over. Our choices of information to be displayed will be the country name, the average meteorite mass per area, the minimum and maximum meteorite size, the amount of meteorites that have been found and the percentage of “Fell” versus “Found” meteorites. We will use some functions of the dplyr package for this purpose.

country_data <- met %>% group_by(country) %>%
                    summarise(m_min := min(mass), m_max := max(mass), tot := n(), fell := sum(fall == 'Fell'))
country_data <- as.data.table(country_data)

world <- merge(world, country_data, by.x = 'country', by.y = 'country')

Now we can use the resulting SpatialPolygonsDataFrame to add information to the labels. These labels should be transformed to HTML-friendly text, which is why we use the htmltools::HTML() function.

labels <- sprintf('
                <div style="width:350px">
                    <strong>%s</strong><br/>
                    Average meteorite mass per km<sup>2</sup>: <span style="float:right">%0.4g g</span><br/>
                    Total number of meteorites: <span style="float:right">%d</span><br/>
                    Minimum meteorite mass: <span style="float:right">%0.4g g</span><br/>
                    Maximum meteorite mass: <span style="float:right">%0.4g g</span><br/>
        
                    <span style="float:left">Fell</span><span style="float:right">Found</span><br/>
                    <span style="color:#67a9cf;float:left">%0.4s%%</span>
                    <span style="color:#ef8a62;float:right">%0.4s%%</span><br/>
                    <span style="background:#67a9cf;width:%s%%;float:left">&nbsp;</span>
                    <span style="background:#ef8a62;width:%s%%;float:right">&nbsp;</span>
                </div>',
                world$country,
                world$m_avg,
                world$tot,
                world$m_min,
                world$m_max,
                100 * world$fell / world$tot,
                100 * (1 - world$fell / world$tot),
                100 * world$fell / world$tot,
                100 * (1 - world$fell / world$tot)) %>%
    lapply(htmltools::HTML)

As a last step, we can now add these labels and a hover functionality to the leaflet map to obtain our final result.

leaflet(world) %>%
    setView(0, 0, zoom = 2) %>%
    addProviderTiles('CartoDB.Positron', options = providerTileOptions(minZoom = 2)) %>%
    addPolygons(fillColor = ~pal(m_avg),
                weight = 1.2,
                opacity = 1,
                color = 'white',
                fillOpacity = 0.7,
                highlight = highlightOptions(weight = 2, color = '#666', fillOpacity = 0.7, bringToFront = TRUE),
                label = labels,
                labelOptions = labelOptions(style = list('font-weight' = 'normal', padding = '3px 8px'),
                                            textsize = '15px', direction = 'auto')) %>%
    addLegend(pal = pal,
              values = ~m_avg,
              opacity = 0.7,
              title = htmltools::HTML('Average mass in g/km<sup>2</sup>'),
              position = 'bottomright')